Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add blob.open() for file-like I/O #385

Merged
merged 9 commits into from
Mar 24, 2021
Merged

feat: add blob.open() for file-like I/O #385

merged 9 commits into from
Mar 24, 2021

Conversation

andrewsg
Copy link
Contributor

Fixes #29

@andrewsg andrewsg requested a review from a team February 22, 2021 02:24
@andrewsg andrewsg requested a review from a team as a code owner February 22, 2021 02:24
@google-cla google-cla bot added the cla: yes This human has signed the Contributor License Agreement. label Feb 22, 2021
@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Feb 22, 2021
@andrewsg
Copy link
Contributor Author

Docstrings are WIP and system tests should be expanded before merging.

@andrewsg andrewsg added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Feb 22, 2021
@andrewsg andrewsg requested review from crwilcox and frankyn February 22, 2021 18:18
Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some feedback that I want to talk through in implementation. Otherwise this is looking good. I'll review the design document as well.

google/cloud/storage/blob.py Show resolved Hide resolved

# Upload chunks. The SlidingBuffer class will manage seek position.
for _ in range(num_chunks):
upload.transmit_next_chunk(transport)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if partial success occurs does transmit_next_chunk handle the retry from the remote offset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Define "partial success"?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GCS may receive a subset of bytes from client upload, leaving the client and server next byte offset unaligned. A request to get current GCS offset can be made per Step Checking the status of a resumable upload.

If the client doesn't attempt to recover from this state, the inconsistent byte alignment will cause a 400 error which is not recoverable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. With the changes in this recent update, sliding buffer implements seek() so this feature is as supported as it's going to be in python-storage. If we want further support we need to flesh it out in resumable media.

Copy link
Member

@frankyn frankyn Mar 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do that for recovering in a mismatch (address it in resumable media repo). Could you file a tracking issue in that repo?

google/cloud/storage/fileio.py Outdated Show resolved Hide resolved
@andrewsg andrewsg removed the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Mar 20, 2021
@andrewsg
Copy link
Contributor Author

Added docstrings, seek functionality in the sliding buffer, and other features. PTAL. Remaining questions: can we set up a system test to test end-to-end the error-on-chunk-upload-and-retry behavior? And, even though blob.download methods do not populate the blob generation, can we fetch that generation information from download metadata somehow in order to implement generation lock? (alternative: force blob.reload() before d/l if generation is not populated - race condition possible).

@frankyn
Copy link
Member

frankyn commented Mar 22, 2021

Remaining questions:

  1. Can we set up a system test to test end-to-end the error-on-chunk-upload-and-retry behavior?
  • Not without GCS emulator. You'd be better off mocking out the API request at this time.
  1. Even though blob.download methods do not populate the blob generation, can we fetch that generation information from download metadata somehow in order to implement generation lock? (alternative: force blob.reload() before d/l if generation is not populated - race condition possible).
  • You can get generation information from response headers. A similar issue came up in Node.js (for checksums) and here's example of responses specifically the header is named X-goog-generation:.

Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Over LGTM, I have a few nits.

Also retrying on mismatched offset may be better handled / addressed in resumable media than in this wrapper so we could table it for now and open tracking issues in resumable media repo. WDYT?

google/cloud/storage/blob.py Outdated Show resolved Hide resolved
google/cloud/storage/blob.py Show resolved Hide resolved
google/cloud/storage/blob.py Outdated Show resolved Hide resolved
google/cloud/storage/blob.py Show resolved Hide resolved
Copy link
Member

@frankyn frankyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! 🚢

@andrewsg andrewsg added the automerge Merge the pull request once unit tests and other checks pass. label Mar 24, 2021
@gcf-merge-on-green gcf-merge-on-green bot merged commit 440a0a4 into master Mar 24, 2021
@gcf-merge-on-green gcf-merge-on-green bot deleted the fileio branch March 24, 2021 19:06
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Mar 24, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the googleapis/python-storage API. cla: yes This human has signed the Contributor License Agreement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an API method to give us a streaming file object
2 participants